Language Independent Transliteration System Using Phrase-based SMT Approach on Substrings

Author

Sara Noeman

Abstract

Every day, newswire reports events from all over the world, introducing new names of persons, locations, and organizations of different origins. These names appear as out-of-vocabulary (OOV) words in machine translation, cross-lingual information retrieval, and many other NLP applications. One way to deal with OOV words is to transliterate them, that is, to render them in the orthography of the second language. We introduce a statistical approach to transliteration that uses only the bilingual resources released in the shared task and requires no prior knowledge of the target languages. Mapping the transliteration problem onto the machine translation problem, we use the phrase-based SMT approach and apply it to substrings of names. In the English-to-Russian task, we report ACC (accuracy in top-1) of 0.545, Mean F-score of 0.917, and MRR (Mean Reciprocal Rank) of 0.596. Due to time constraints, we ran a single experiment in the English-to-Chinese task, reporting ACC, Mean F-score, and MRR of 0.411, 0.737, and 0.464, respectively. Finally, it is worth mentioning that the system is language independent, since the author is not familiar with either of the target languages used in the experiments.
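As a rough illustration of two of the metrics reported above, the following sketch computes top-1 accuracy (ACC) and Mean Reciprocal Rank (MRR) over ranked candidate lists. It is not the shared task's official scorer: it assumes a single reference transliteration per source name, and the function names and the toy English-Russian pairs are invented for illustration only.

```python
from typing import Dict, List


def top1_accuracy(candidates: Dict[str, List[str]], references: Dict[str, str]) -> float:
    """Fraction of source names whose first-ranked candidate equals the reference."""
    hits = 0
    for source, reference in references.items():
        ranked = candidates.get(source, [])
        if ranked and ranked[0] == reference:
            hits += 1
    return hits / len(references)


def mean_reciprocal_rank(candidates: Dict[str, List[str]], references: Dict[str, str]) -> float:
    """Average of 1/rank of the reference in each candidate list (0 if it is absent)."""
    total = 0.0
    for source, reference in references.items():
        ranked = candidates.get(source, [])
        if reference in ranked:
            total += 1.0 / (ranked.index(reference) + 1)
    return total / len(references)


# Toy data (hypothetical, not taken from the paper or the shared task).
references = {"anna": "анна", "ivan": "иван", "boris": "борис", "olga": "ольга"}
candidates = {
    "anna": ["анна", "ана"],    # reference at rank 1
    "ivan": ["айван", "иван"],  # reference at rank 2
    "boris": ["борис"],         # reference at rank 1
    "olga": ["олга", "олья"],   # reference not in the list
}

acc = top1_accuracy(candidates, references)
mrr = mean_reciprocal_rank(candidates, references)
print(f"ACC={acc:.3f} MRR={mrr:.3f}")
```

MRR can never be lower than ACC, because it also gives partial credit when the correct transliteration appears below rank 1. This matches the pattern in the reported figures (0.596 against 0.545 for English to Russian).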


Similar resources

Transliteration using phrase based SMT approach on substrings

Translation of named entities (NEs), such as person names, organization names, and location names, is crucial for cross-lingual information retrieval, machine translation, and many other natural language processing applications. New named entities are introduced on a daily basis in newswire, and this greatly complicates the translation task. Named Entities translation between languages having diff...


Integrating Models Derived from non-Parametric Bayesian Co-segmentation into a Statistical Machine Transliteration System

The system presented in this paper is based upon a phrase-based statistical machine transliteration (SMT) framework. The SMT system’s log-linear model is augmented with a set of features specifically suited to the task of transliteration. In particular our model utilizes a feature based on a joint source-channel model, and a feature based on a maximum entropy model that predicts target grapheme...


English-Korean Named Entity Transliteration Using Statistical Substring-based and Rule-based Approaches

This paper describes our approach to English-Korean transliteration in the NEWS 2011 Shared Task on Machine Transliteration. We adopt the substring-based transliteration approach, which groups the characters of a named entity in both source and target languages into substrings and then formulates the transliteration as a sequential tagging problem to tag the substrings in the source language with the su...


English-Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009

This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling into state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framew...


Hindi Transliteration Using Context-Informed PB-SMT: the DCU System for NEWS 2009



Publication date: 2009
